Search | WHO COVID-19 Research Database

Unsupervised cluster analysis of SARS-CoV-2 genomes indicates that recent (June 2020) cases in Beijing are from a genetic subgroup that consists of mostly European and South(east) Asian samples, of which the latter are the most recent

Hahn, Georg; Cho, Michael H.; Weiss, Scott T.; Silverman, Edwin K.; Lange, Christoph.

[Unspecified Source]; 2020.

Non-conventional in English | [Unspecified Source] | ID: grc-750459

ABSTRACT

Research efforts during the ongoing SARS-CoV-2 pandemic have focused on viral genome sequence analysis to understand how the virus spread across the globe. Here, we assess three recently identified SARS-CoV-2 genomes in Beijing from June 2020 and attempt to determine the origin of these genomes, made available in the GISAID database. The database contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Including the three new samples and excluding samples with missing annotations, we analyzed 7,643 SARS-CoV-2 genomes. Using principal component analysis computed on a similarity matrix that compares all pairs of the SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index, we find that the newly discovered virus genomes from Beijing are in a genetic cluster that consists mostly of cases from Europe and South(east) Asia. The sequences of the new cases are most closely related to virus genomes from a small number of cases from China (March 2020), cases from Europe (February to early May 2020), and cases from South(east) Asia (May to June 2020). These findings could suggest that the original cases of this genetic cluster originated from China in March 2020 and were re-introduced to China by transmission from South(east) Asia between April and June 2020.
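The clustering step described in this abstract (a Jaccard similarity matrix over all genome pairs, followed by principal component analysis) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each genome has already been encoded as a binary vector marking the loci where it differs from a reference sequence, and the function names and the toy matrix are hypothetical.

```python
import numpy as np

def jaccard_similarity_matrix(X):
    """Pairwise Jaccard index between the rows of a binary matrix X (genomes x loci)."""
    X = np.asarray(X, dtype=bool).astype(int)
    intersection = X @ X.T                      # loci where both genomes differ from the reference
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - intersection
    # Assumption: two genomes with no mismatches at all are treated as identical.
    return np.where(union > 0, intersection / np.maximum(union, 1), 1.0)

def top_components(S, k=2):
    """Project the rows of S onto the k leading principal components."""
    centered = S - S.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (S.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return centered @ eigvecs[:, order]

# Toy data (hypothetical): genomes 0 and 1 share one mutation pattern, 2 and 3 another.
toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 0]]
S = jaccard_similarity_matrix(toy)
scores = top_components(S, k=2)
```

In this toy example the first component separates genomes 0 and 1 from genomes 2 and 3; on real data, the component scores would be plotted and colored by sampling region and date to identify clusters.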

Genome-wide association analysis of COVID-19 mortality risk in SARS-CoV-2 genomes identifies mutation in the SARS-CoV-2 spike protein that colocalizes with P.1 of the Brazilian strain.

Hahn, Georg; Wu, Chloe M; Lee, Sanghun; Lutz, Sharon M; Khurana, Surender; Baden, Lindsey R; Haneuse, Sebastien; Qiao, Dandi; Hecker, Julian; DeMeo, Dawn L; Tanzi, Rudolph E; Choudhary, Manish C; Etemad, Behzad; Mohammadi, Abbas; Esmaeilzadeh, Elmira; Cho, Michael H; Li, Jonathan Z; Randolph, Adrienne G; Laird, Nan M; Weiss, Scott T; Silverman, Edwin K; Ribbeck, Katharina; Lange, Christoph.

Genet Epidemiol ; 45(7): 685-693, 2021 10.

Article in English | MEDLINE | ID: covidwho-1279364

ABSTRACT

SARS-CoV-2 mortality has been extensively studied in relation to host susceptibility. How sequence variations in the SARS-CoV-2 genome affect pathogenicity is poorly understood. Starting in October 2020, using the methodology of genome-wide association studies (GWAS), we looked at the association between whole-genome sequencing (WGS) data of the virus and COVID-19 mortality as a potential method of early identification of highly pathogenic strains to target for containment. Although we continuously update our analysis, in December 2020 we analyzed 7548 single-stranded SARS-CoV-2 genomes of COVID-19 patients in the GISAID database and associated variants with mortality using a logistic regression. In total, evaluating 29,891 sequenced loci of the viral genome for association with patient/host mortality, two loci, at 12,053 and 25,088 bp, achieved genome-wide significance (p values of 4.09e-09 and 4.41e-23, respectively), though only 25,088 bp remained significant in follow-up analyses. Our association findings were exclusively driven by the samples that were submitted from Brazil (p value of 4.90e-13 for 25,088 bp). The mutation frequency of 25,088 bp in the Brazilian samples on GISAID has rapidly increased from about 0.4 in October/December 2020 to 0.77 in March 2021. Although GWAS methodology is suitable for samples in which mutation frequencies vary between geographical regions, it cannot account for mutation frequencies that change rapidly over time, rendering a GWAS follow-up analysis of the GISAID samples that have been submitted after December 2020 invalid. The locus at 25,088 bp is located in the P.1 strain, which later (April 2021) became one of the distinguishing loci (precisely, substitution V1176F) of the Brazilian strain as defined by the Centers for Disease Control. Specifically, the mutations at 25,088 bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry into target host cells.
Since the mutations alter amino acid coding sequences, they potentially impose structural changes that could enhance viral infectivity and symptom severity. Our analysis suggests that GWAS methodology can provide suitable analysis tools for the real-time detection of new, more transmissible and pathogenic viral strains in databases such as GISAID, though new approaches are needed to accommodate rapidly changing mutation frequencies over time, in the presence of simultaneously changing case/control ratios. Improvements of the associated metadata/patient information in terms of quality and availability will also be important to fully utilize the potential of GWAS methodology in this field.
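The per-locus association test described in this abstract can be illustrated with a single-predictor logistic regression. The sketch below is an assumption-laden illustration, not the authors' code: it fits mortality against a binary mutation indicator at one locus by Newton-Raphson and reports a two-sided Wald p value. The abstract does not state which covariates or which test statistic were used, and the function name and toy counts are hypothetical.

```python
import math
import numpy as np

def logistic_wald_test(mutation, died, n_iter=25):
    """Fit logit P(died) = b0 + b1 * mutation; return (b1, two-sided Wald p value)."""
    x = np.asarray(mutation, dtype=float)
    y = np.asarray(died, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # intercept plus mutation indicator
    beta = np.zeros(2)
    for _ in range(n_iter):                     # Newton-Raphson updates
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))

# Toy data (hypothetical): 10 carriers with 7 deaths, 10 non-carriers with 2 deaths.
mutation = [1] * 10 + [0] * 10
died = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8
b1, p_value = logistic_wald_test(mutation, died)
```

In a genome-wide scan, a test of this kind would be repeated at every locus and the resulting p values compared against a multiple-testing threshold. The sketch does not handle loci where the mutation perfectly separates deaths from survivors, in which case the Newton-Raphson updates fail to converge.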

Subject(s)

COVID-19 , Spike Glycoprotein, Coronavirus , Brazil , Genome-Wide Association Study , Humans , Mutation , Phylogeny , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/genetics

Machine Learning and Prediction of All-Cause Mortality in COPD.

Moll, Matthew; Qiao, Dandi; Regan, Elizabeth A; Hunninghake, Gary M; Make, Barry J; Tal-Singer, Ruth; McGeachie, Michael J; Castaldi, Peter J; San Jose Estepar, Raul; Washko, George R; Wells, James M; LaFon, David; Strand, Matthew; Bowler, Russell P; Han, MeiLan K; Vestbo, Jorgen; Celli, Bartolome; Calverley, Peter; Crapo, James; Silverman, Edwin K; Hobbs, Brian D; Cho, Michael H.

Chest ; 158(3): 952-964, 2020 09.

Article in English | MEDLINE | ID: covidwho-987243

ABSTRACT

BACKGROUND: COPD is a leading cause of mortality. RESEARCH QUESTION: We hypothesized that applying machine learning to clinical and quantitative CT imaging features would improve mortality prediction in COPD. STUDY DESIGN AND METHODS: We selected 30 clinical, spirometric, and imaging features as inputs for a random survival forest. We used top features in a Cox regression to create a machine learning mortality prediction (MLMP) in COPD model and also assessed the performance of other statistical and machine learning models. We trained the models in subjects with moderate to severe COPD from a subset of subjects in Genetic Epidemiology of COPD (COPDGene) and tested prediction performance in the remainder of individuals with moderate to severe COPD in COPDGene and Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE). We compared our model with the BMI, airflow obstruction, dyspnea, exercise capacity (BODE) index; BODE modifications; and the age, dyspnea, and airflow obstruction index. RESULTS: We included 2,632 participants from COPDGene and 1,268 participants from ECLIPSE. The top predictors of mortality were 6-min walk distance, FEV1 % predicted, and age. The top imaging predictor was pulmonary artery-to-aorta ratio. The MLMP-COPD model resulted in a C index ≥ 0.7 in both COPDGene and ECLIPSE (6.4- and 7.2-year median follow-ups, respectively), significantly better than all tested mortality indexes (P < .05). The MLMP-COPD model had fewer predictors but similar performance to that of other models. The group with the highest BODE scores (7-10) had 64% mortality, whereas the highest mortality group defined by the MLMP-COPD model had 77% mortality (P = .012). INTERPRETATION: An MLMP-COPD model outperformed four existing models for predicting all-cause mortality across two COPD cohorts. Performance of machine learning was similar to that of traditional statistical methods. 
The model is available online at: https://cdnm.shinyapps.io/cgmortalityapp/.
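The C index reported in this abstract measures how often a model ranks a patient who died earlier as higher risk than a patient who survived longer. A minimal sketch of Harrell's concordance index for right-censored data is given below; the function name and the toy values are hypothetical, and the study's own implementation and tie handling are not described in the abstract.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when subject i had an observed death and
    subject j was still under follow-up at a strictly later time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                            # censored subjects cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): follow-up in years, 1 = died, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)   # 4 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported C index of at least 0.7.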

Subject(s)

Machine Learning , Pulmonary Disease, Chronic Obstructive/mortality , Cause of Death , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Respiratory Function Tests

Hahn, Georg; Cho, Michael H; Weiss, Scott T; Silverman, Edwin K; Lange, Christoph.

bioRxiv ; 2020 Jun 30.

Article in English | MEDLINE | ID: covidwho-638106
